Multitime scale Markov decision processes

Authors

  • Hyeong Soo Chang
  • Pedram Jaefari Fard
  • Steven I. Marcus
  • Mark A. Shayman
Abstract

This paper proposes a simple analytical model, the M-time-scale Markov decision process (MMDP), for hierarchically structured sequential decision-making processes in which decisions at each level of the M-level hierarchy are made on M different time scales. In this model, the state space and the control space of each level are non-overlapping with those of the other levels, and the hierarchy is structured in a “pyramid” sense: the state and/or the decision at level m (slower time scale) affects the evolution of the decision-making process at the lower level m + 1 (faster time scale) until a new decision is made at the higher level, but the lower-level decisions themselves do not affect the higher level’s transition dynamics. The performance produced by the lower level’s decisions does, however, affect the higher level’s decisions. A hierarchical objective function is defined such that the single-step reward for the level-m decision-making process is the finite-horizon value of following a (nonstationary) policy at level m + 1 over one decision epoch of level m, plus an immediate reward at level m. From this we define a “multi-level optimal value function” and derive a “multi-level optimality equation”. We discuss how to solve MMDPs exactly or approximately and also study heuristic on-line methods for solving them. Finally, we give some example control problems that can be modeled as MMDPs.
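The hierarchical objective described in the abstract can be illustrated with a small two-level (M = 2) sketch. This is not the paper's algorithm: every state set, transition probability, reward, the epoch length `T`, the discount factor `GAMMA`, and all function names are hypothetical. It also makes one strong simplifying assumption, namely that the lower level restarts from a fixed state `x0` at the start of every upper-level epoch. The sketch only shows how a lower-level finite-horizon value can serve as part of the upper level's single-step reward.

```python
# Hypothetical two-level instance; all numbers are made up for illustration.
UPPER_STATES = [0, 1]
UPPER_ACTIONS = [0, 1]
LOWER_STATES = [0, 1]
LOWER_ACTIONS = [0, 1]
T = 3        # lower-level steps per upper-level decision epoch (assumed)
GAMMA = 0.9  # upper-level discount factor (assumed)


def upper_transition(i, a):
    """P(next upper state | i, a). Lower-level decisions do not appear here."""
    stay = 0.8 if a == 0 else 0.3
    return {i: stay, 1 - i: 1.0 - stay}


def upper_reward(i, a):
    """Immediate reward collected at the upper level."""
    return (1.0 if i == 1 else 0.0) - 0.2 * a


def lower_transition(x, u, i, a):
    """P(next lower state | x, u), shaped by the upper state i and action a."""
    p_one = 0.5 + 0.2 * u + 0.1 * a - 0.1 * i
    return {1: p_one, 0: 1.0 - p_one}


def lower_reward(x, u, i, a):
    """Lower-level reward, also shaped by the upper state."""
    return float(x) - 0.3 * u + 0.5 * i


def lower_epoch_value(i, a, x0):
    """Optimal T-step value at the lower level while (i, a) is held fixed.

    Backward induction yields a nonstationary lower-level policy implicitly.
    """
    v = {x: 0.0 for x in LOWER_STATES}
    for _ in range(T):
        v = {
            x: max(
                lower_reward(x, u, i, a)
                + sum(p * v[y] for y, p in lower_transition(x, u, i, a).items())
                for u in LOWER_ACTIONS
            )
            for x in LOWER_STATES
        }
    return v[x0]


def single_step_reward(i, a, x0=0):
    """Upper immediate reward plus the lower level's value over one epoch."""
    return upper_reward(i, a) + lower_epoch_value(i, a, x0)


def solve_upper(tol=1e-9):
    """Value iteration on the upper level using the composite reward."""
    r = {(i, a): single_step_reward(i, a)
         for i in UPPER_STATES for a in UPPER_ACTIONS}

    def q(i, a, v):
        return r[(i, a)] + GAMMA * sum(
            p * v[j] for j, p in upper_transition(i, a).items())

    v = {i: 0.0 for i in UPPER_STATES}
    while True:
        new_v = {i: max(q(i, a, v) for a in UPPER_ACTIONS)
                 for i in UPPER_STATES}
        gap = max(abs(new_v[i] - v[i]) for i in UPPER_STATES)
        v = new_v
        if gap < tol:
            break
    policy = {i: max(UPPER_ACTIONS, key=lambda a: q(i, a, v))
              for i in UPPER_STATES}
    return v, policy
```

Calling `solve_upper()` returns the upper-level value function and a greedy upper-level policy for this toy instance. Dropping the fixed-restart assumption would require the upper-level value to depend on the lower-level state as well, which this sketch does not attempt.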

Similar resources

Unifying temporal and organizational scales in multiscale decision-making

In enterprise systems, making decisions is a complex task for agents at all levels of the organizational hierarchy. To calculate an optimal course of action, an agent has to include uncertainties and the anticipated decisions of other agents, recognizing that they also engage in a stochastic, game-theoretic reasoning process. Furthermore, higher-level agents seek to align the interests of their...

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Multi-Time-Scale Markov Decision Processes for Organizational Decision-Making

Decision-makers in organizations and other hierarchical systems interact within and across multiple organizational levels and take interdependent actions over time. The challenge is to identify incentive mechanisms that align agents’ interests and to provide these agents with guidance for their decision processes. To this end, we developed a multiscale decision-making model that combines game t...

ADK Entropy and ADK Entropy Rate in Irreducible-Aperiodic Markov Chain and Gaussian Processes

In this paper, the two-parameter ADK entropy, a generalization of Rényi entropy, is considered and some of its properties are investigated. We will see that the ADK entropy for continuous random variables is invariant under a location transformation but not under a scale transformation of the random variable. Furthermore, the joint ADK entropy, conditional ADK entropy, and chain rule of this ent...

Extended Geometric Processes: Semiparametric Estimation and Application to Reliability (imperfect repair, Markov renewal equation, replacement policy)

Lam (2007) introduces a generalization of renewal processes named Geometric processes, where inter-arrival times are independent and identically distributed up to a multiplicative scale parameter, in a geometric fashion. We here envision a more general scaling, not necessarily geometric. The corresponding counting process is named Extended Geometric Process (EGP). Semiparametric estimates are...

Journal:
  • IEEE Trans. Automat. Contr.

Volume 48

Publication date: 2003